Symbol Codes


Arithmetic Coding
Practical data compression


Claims: 1. The Shannon information content of an outcome,
$h(x) = \log_2 \frac{1}{P(x)}$,
is a sensible measure of information content.


2. The entropy 


$H(X) = \sum_x P(x) \log_2 \frac{1}{P(x)}$


is a sensible measure of expected information 
content. 


3. Source coding theorem:
N outcomes from a source X can be compressed
into roughly $N H(X)$ bits.
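
As a small numerical illustration of claims 1 to 3, the sketch below computes Shannon information contents and the entropy of an example distribution. The function names and the example probabilities (1/2, 1/4, 1/8, 1/8) are assumptions chosen here for illustration.

```python
from math import log2

def information_content(p):
    """Shannon information content h = log2(1/p), in bits."""
    return log2(1.0 / p)

def entropy(probs):
    """Entropy H(X) = sum_x P(x) log2(1/P(x)), in bits.

    Outcomes with zero probability contribute nothing to the sum.
    """
    return sum(p * information_content(p) for p in probs if p > 0)

# Illustrative distribution (an assumption, chosen for round numbers).
example_probs = [1 / 2, 1 / 4, 1 / 8, 1 / 8]

h_values = [information_content(p) for p in example_probs]  # 1, 2, 3, 3 bits
H = entropy(example_probs)                                  # 1.75 bits

# Source coding theorem, informally: N outcomes need roughly N * H bits.
N = 1000
print(h_values, H, N * H)
```

For this distribution the entropy is 1.75 bits, so 1000 outcomes should compress to roughly 1750 bits.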


Symbol codes
Last time


Symbol codes
- unique decodeability requires $\sum_i 2^{-l_i} \leq 1$ (Kraft inequality)
  [if equality holds, the code is 'complete']
- prefix codes and binary trees
- whenever a code achieved $L = H$, it had
  codelengths $l_i$ equal to the information contents,
  $l_i = \log_2 \frac{1}{p_i}$
- optimal symbol codes
  The optimal symbol code's expected length $L$ satisfies
  $H(X) \leq L < H(X) + 1$
- Huffman algorithm
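
The Kraft inequality and the prefix property recalled above can be checked mechanically. The following is a minimal sketch; the helper names `kraft_sum` and `is_prefix_code` and the example codes are assumptions introduced here, not part of the lecture.

```python
from fractions import Fraction

def kraft_sum(codewords):
    """Sum of 2^(-l_i) over the codeword lengths l_i (exact arithmetic)."""
    return sum(Fraction(1, 2 ** len(c)) for c in codewords)

def is_prefix_code(codewords):
    """True if no codeword is a prefix of a different codeword in the list."""
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(codewords)
        for j, b in enumerate(codewords)
    )

# Illustrative prefix code and matching probabilities (assumptions).
code = ["0", "10", "110", "111"]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]

complete = kraft_sum(code) == 1          # equality holds: the code is complete
expected_length = sum(p * len(c) for p, c in zip(probs, code))
# Every length here equals log2(1/p_i), so L equals H(X) = 7/4 bits.

# A set of lengths that overspends the budget cannot be uniquely decodeable.
over_budget = kraft_sum(["0", "1", "00"])

print(kraft_sum(code), is_prefix_code(code), expected_length, over_budget)
```

Exact fractions are used so that the comparison with 1 is not affected by floating-point rounding.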


What is the optimal symbol code for 


Ensemble X    [probabilities illegible]

Ensemble Y
a    1/Z
b    2/Z
c    3/Z
d    4/Z
e    5/Z
f    6/Z
g    7/Z
h    8/Z
i    9/Z
j    10/Z ?
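
One way to explore the question for Ensemble Y is to run the Huffman algorithm directly. The sketch below is a generic heap-based implementation, not necessarily the huffman.py from the course website; it assumes that Z is the normalizing constant, so Z = 1 + 2 + ... + 10 = 55, and the function name `huffman_lengths` is introduced here for illustration.

```python
import heapq
from fractions import Fraction
from itertools import count

def huffman_lengths(probs):
    """Return {symbol: codeword length} for a binary Huffman code.

    Repeatedly merges the two least probable nodes; every symbol under
    a merged node gains one bit of codeword length.
    """
    tiebreak = count()  # unique counter so the symbol lists are never compared
    heap = [(p, next(tiebreak), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths

Z = 55  # assumption: Z normalizes the weights 1..10
ensemble_y = {s: Fraction(i, Z) for i, s in enumerate("abcdefghij", start=1)}

lengths_y = huffman_lengths(ensemble_y)
expected_length_y = sum(ensemble_y[s] * lengths_y[s] for s in ensemble_y)
kraft_sum_y = sum(Fraction(1, 2 ** l) for l in lengths_y.values())

print(lengths_y)
print(float(expected_length_y))  # 173/55, about 3.15 bits per symbol
```

Individual codeword lengths can depend on how ties are broken, but the expected length of a Huffman code does not.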


01 


110 


The total symbol code budget 


Prefix codes suffice 


a_i  p_i     log2(1/p_i)  l_i  c(a_i)

a    0.0575   4.1          4   0000
b    0.0128   6.3          6   001000
c    0.0263   5.2          5   00101
d    0.0285   5.1          5   10000
e    0.0913   3.5          4   1100
f    0.0173   5.9          6   111000
g    0.0133   6.2          6   001001
h    0.0313   5.0          5   10001
i    0.0599   4.1          4   1001
j    0.0006  10.7         10   1101000000
k    0.0084   6.9          7   1010000
l    0.0335   4.9          5   11101
m    0.0235   5.4          6   110101
n    0.0596   4.1          4   0001
o    0.0689   3.9          4   1011
p    0.0192   5.7          6   111001
q    0.0008  10.3          9   110100001
r    0.0508   4.3          5   11011
s    0.0567   4.1          4   0011
t    0.0706   3.8          4   1111
u    0.0334   4.9          5   10101
v    0.0069   7.2          8   11010001
w    0.0119   6.4          7   1101001
x    0.0073   7.1          7   1010001
y    0.0164   5.9          6   101001
z    0.0007  10.4         10   1101000001
-    0.1928   2.4          2   01
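
The table can be checked numerically. The sketch below re-enters the probabilities and codewords as read from the table (with "-" standing for the space character) and computes the expected length, the entropy and the Kraft sum; the variable names are assumptions made here.

```python
from math import log2

# symbol, probability, codeword: one row per line, as read from the table
TABLE = """
a 0.0575 0000
b 0.0128 001000
c 0.0263 00101
d 0.0285 10000
e 0.0913 1100
f 0.0173 111000
g 0.0133 001001
h 0.0313 10001
i 0.0599 1001
j 0.0006 1101000000
k 0.0084 1010000
l 0.0335 11101
m 0.0235 110101
n 0.0596 0001
o 0.0689 1011
p 0.0192 111001
q 0.0008 110100001
r 0.0508 11011
s 0.0567 0011
t 0.0706 1111
u 0.0334 10101
v 0.0069 11010001
w 0.0119 1101001
x 0.0073 1010001
y 0.0164 101001
z 0.0007 1101000001
- 0.1928 01
"""

rows = [line.split() for line in TABLE.strip().splitlines()]
probs = {s: float(p) for s, p, _ in rows}
code = {s: c for s, _, c in rows}

L = sum(probs[s] * len(code[s]) for s in probs)      # expected length, bits
H = sum(p * log2(1 / p) for p in probs.values())     # entropy, bits
kraft = sum(2.0 ** -len(c) for c in code.values())   # 1.0 for a complete code

print(round(L, 2), round(H, 2), kraft)
```

The rounded probabilities in the table sum to slightly more than 1, so the computed values are approximate. The expected length stays within one bit of the entropy, as the bound on optimal symbol codes requires.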


Symbol code summary


Gibbs inequality
Symbol codes
- unique decodeability requires $\sum_i 2^{-l_i} \leq 1$ (Kraft inequality)
  [if equality holds, the code is 'complete']
- prefix codes and binary trees
- optimal symbol codes
  The ideal codelengths $l_i$ are the information contents
  $l_i = \log_2 \frac{1}{p_i}$
  The optimal symbol code's expected length $L$ satisfies
  $H(X) \leq L < H(X) + 1$
- Huffman algorithm


Today 


The ideal codelengths $l_i$ are the information contents

$l_i = \log_2 \frac{1}{p_i}$


The optimal symbol code's expected length $L$ satisfies
$H(X) \leq L < H(X) + 1$


- Does that wrap up compression?
  - What's wrong with optimal symbol codes?


- Arithmetic coding


The Guessing Game 


Headline composed of 
(A, B, C, D, E, F, G, H, I, J, K, L, M, ..., Z, -) 
Realistic compression


Optimal symbol code for

a_i  p_i    c(a_i)
a    0.001  00000
b    0.001  00001
c    0.990  1
d    0.001  00010
e    0.001  00011
f    0.001  0100
g    0.001  0101
h    0.001  0110
i    0.001  0111
j    0.001  0010
k    0.001  0011

expected length   1.034
entropy           0.11401
length / entropy  9.07
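
The figures for this ensemble can be reproduced with a few lines. This is a minimal sketch using the probabilities and codewords from the table; the variable names are assumptions made here.

```python
from math import log2

# Ten rare symbols with probability 0.001 each, and one very common symbol c.
probs = {s: 0.001 for s in "abdefghijk"}
probs["c"] = 0.990

code = {
    "a": "00000", "b": "00001", "c": "1", "d": "00010", "e": "00011",
    "f": "0100", "g": "0101", "h": "0110", "i": "0111",
    "j": "0010", "k": "0011",
}

L = sum(probs[s] * len(code[s]) for s in probs)     # expected length per symbol
H = sum(p * log2(1 / p) for p in probs.values())    # entropy per symbol
ratio = L / H

print(round(L, 3), round(H, 5), round(ratio, 2))
```

A symbol code must spend at least one bit on every symbol, so when the entropy is far below one bit per symbol even the optimal symbol code is about nine times longer than the entropy here.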
Arithmetic coding
Other uses for arithmetic coding


Efficient writing
Compression:
Text -> Bit string
(preferably short)
Writing:


Text <- Gesture
(preferably brief)


Dasher 


www.inference.phy.cam.ac.uk/dasher/ 


Q: If we encode symbols from the ensemble
$A_X = \{a, b, c, d\}$
$P_X = \{1/2, 1/4, 1/8, 1/8\}$
using the symbol code
$C = (0, 10, 110, 111)$,


what is the probability $p_1$
that a bit plucked at random from the encoded stream
is a 1?
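
One way to approach this question numerically is to compare the expected number of 1s per source symbol with the expected codeword length. The sketch below assumes a long encoded stream, so that the fraction of 1s is the ratio of these two expectations; the variable names are introduced here for illustration.

```python
from fractions import Fraction

probs = {
    "a": Fraction(1, 2),
    "b": Fraction(1, 4),
    "c": Fraction(1, 8),
    "d": Fraction(1, 8),
}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

# Expected number of 1 bits and expected number of bits per source symbol.
expected_ones = sum(probs[s] * code[s].count("1") for s in probs)
expected_length = sum(probs[s] * len(code[s]) for s in probs)

# Assumption: in a long stream, the fraction of 1s is the ratio of expectations.
p_1 = expected_ones / expected_length

print(expected_ones, expected_length, p_1)
```

Note that the answer is not obtained by averaging the fraction of 1s within each codeword, because longer codewords contribute more bits to the stream.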
Recommended exercises


Project:


Invent a compressor and uncompressor for a
source file of $N = 10,000$ bits, each having probability
$f = 0.01$ of being a 1.

Implement them and/or
estimate how well your method works.
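
As one possible starting point for the project, and not a method prescribed by the lecture, the sketch below compresses the runs of 0s between successive 1s with a Golomb-Rice code and compares the result with the entropy estimate N H_2(f). The choice of Rice parameter k = 6, the sentinel bit, the fixed random seed and all function names are assumptions made here.

```python
import random
from math import log2

def binary_entropy(f):
    """H_2(f) in bits, for 0 < f < 1."""
    return f * log2(1 / f) + (1 - f) * log2(1 / (1 - f))

def rice_encode(bits, k=6):
    """Encode each run of 0s (ended by a 1) with a Rice code of parameter k."""
    out, run = [], 0
    for b in list(bits) + [1]:            # sentinel 1 closes the final run
        if b == 0:
            run += 1
            continue
        out.extend([1] * (run >> k) + [0])                      # quotient, unary
        out.extend((run >> i) & 1 for i in reversed(range(k)))  # remainder, k bits
        run = 0
    return out

def rice_decode(code, k=6):
    """Invert rice_encode and return the original list of bits."""
    bits, pos = [], 0
    while pos < len(code):
        q = 0
        while code[pos] == 1:             # read the unary quotient
            q += 1
            pos += 1
        pos += 1                          # skip the 0 that ends the unary part
        r = 0
        for _ in range(k):                # read the k-bit remainder
            r = (r << 1) | code[pos]
            pos += 1
        bits.extend([0] * ((q << k) + r) + [1])
    return bits[:-1]                      # drop the sentinel

N, f = 10_000, 0.01
rng = random.Random(0)                    # fixed seed, for reproducibility only
source = [1 if rng.random() < f else 0 for _ in range(N)]

compressed = rice_encode(source)
assert rice_decode(compressed) == source  # lossless round trip

bound = N * binary_entropy(f)             # about 808 bits
print(len(compressed), "bits used; entropy estimate", round(bound), "bits")
```

The parameter k = 6 (runs measured in units of 64) is chosen because the typical run length at f = 0.01 is about 100 zeros. Other values of k, or an arithmetic coder, may do better.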


- Huffman programs huffman.p, huffman.py are on website
  - also a 'bent coin' file 0010000... as a compression benchmark


Also exercises 5.22, 5.26, 5.27


- Reading: Chapters 1, 2, 4, 5, and 6.
- Advance reading: Chapters 8, 9, 10


